by Peter de Blanc + ChatGPT Deep Research
Posted to Adarie (www.adarie.com) on April 10, 2025
Content License: Creative Commons CC0 (No Rights Reserved)
Multilingual Neural Models and Language Embeddings: Modern neural NLP models often use multilingual architectures (e.g. multilingual Transformers like mBERT, mT5, mBART, or GPT-style models) that share parameters across languages. In such models, languages can be represented by learned language embeddings or tokens in a common latent space. For example, Google’s multilingual NMT system (Johnson et al., 2017) prepended a special token indicating the target language to each input. The model learned a universal interlingua representation that enabled translating between languages it was never directly trained on (zero-shot translation) ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation). Crucially, the language indicator in these models is a continuous vector – meaning we can mathematically manipulate it (not just treat it as a fixed one-hot code). This setup opens the door to interpolating between language embeddings to generate hybrid outputs.
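To make this concrete, here is a minimal sketch of language-token steering with an open multilingual model, using Hugging Face's mBART-50 many-to-many checkpoint as a stand-in (Johnson et al.'s system itself is not publicly released; the model name and example sentence are illustrative assumptions). Forcing the first decoder token to a language code plays the same role as the prepended <2fr>-style target token in the original system.

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

name = "facebook/mbart-large-50-many-to-many-mmt"
model = MBartForConditionalGeneration.from_pretrained(name)
tokenizer = MBart50TokenizerFast.from_pretrained(name)

tokenizer.src_lang = "en_XX"                       # language of the input text
inputs = tokenizer("The weather is nice today.", return_tensors="pt")

# A single special token tells the decoder which language to produce,
# analogous to Johnson et al.'s prepended target-language token.
out = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"],
    max_new_tokens=40,
)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```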
Interpolation Experiments – Mixing Languages in Output: Researchers have explicitly experimented with feeding mixtures of language codes into multilingual models to see what happens. Johnson et al. (2017) conducted a weighted target language experiment: instead of specifying a single target language, they provided a linear combination of two target-language embeddings (e.g. half English, half French) to the decoder ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation). The expectation was that the model might produce some intermediate “blended” language. In practice, the outcome was usually code-switching or a sudden switch from one language to the other around the 50/50 weight mark ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation). For instance, with an English↔Japanese/Korean model, when gradually shifting the target embedding from Japanese to Korean, the output started in Japanese and then switched to Korean mid-sentence ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation). In some cases the model cleanly output one language up to a point and then the other. Interestingly, they observed that for closely related languages, the interpolation sometimes yielded an actual third language. In one example, interpolating between Russian and Belarusian caused the model to briefly output Ukrainian (a related Slavic language) before fully transitioning to Belarusian ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation). This suggests the latent language vectors had a logical arrangement: the “path” in vector space from Russian to Belarusian passed near Ukrainian, so the model “stepped into” that language. For more distantly related languages (like Japanese and Korean, which use different scripts), the model mostly produced abrupt code-switches rather than a smooth blend ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation).
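Johnson et al.'s system is not public, but the interpolation idea can be sketched against an open checkpoint. The snippet below blends the embeddings of two target-language codes in mBART-50 and decodes greedily from the blended prefix; the model choice, the Japanese/Korean pair, and the manual decoding loop are all assumptions made for illustration, not a reproduction of the original experiment.

```python
import torch
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

name = "facebook/mbart-large-50-many-to-many-mmt"   # stand-in for Johnson et al.'s model
model = MBartForConditionalGeneration.from_pretrained(name).eval()
tokenizer = MBart50TokenizerFast.from_pretrained(name)

tokenizer.src_lang = "en_XX"
enc = tokenizer("I would like a cup of coffee.", return_tensors="pt")

emb = model.get_input_embeddings()                   # shared token-embedding table
# mBART scales token embeddings internally; mirror that when bypassing the lookup.
scale = getattr(model.model.decoder, "embed_scale", 1.0)

alpha = 0.5                                          # 1.0 = pure Japanese, 0.0 = pure Korean
blended = alpha * emb.weight[tokenizer.lang_code_to_id["ja_XX"]] \
        + (1 - alpha) * emb.weight[tokenizer.lang_code_to_id["ko_KR"]]

# mBART decoding normally starts with </s> followed by the target-language code;
# here the language-code slot holds the blended vector instead of a real token.
prefix = torch.stack([emb.weight[tokenizer.eos_token_id], blended]) * scale
dec_embeds = prefix.unsqueeze(0)                     # shape (1, 2, hidden)

generated = []
with torch.no_grad():
    enc_out = model.get_encoder()(**enc)
    for _ in range(40):
        logits = model(
            encoder_outputs=enc_out,
            attention_mask=enc["attention_mask"],
            decoder_inputs_embeds=dec_embeds,
        ).logits
        next_id = int(logits[0, -1].argmax())
        if next_id == tokenizer.eos_token_id:
            break
        generated.append(next_id)
        next_emb = (emb.weight[next_id] * scale).view(1, 1, -1)
        dec_embeds = torch.cat([dec_embeds, next_emb], dim=1)

print(tokenizer.decode(generated, skip_special_tokens=True))
```

Sweeping alpha from 0 to 1 and printing the output at each step is the analogue of the weighted target-language experiment; as the paper reports, one should expect abrupt switches rather than a smooth blend.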
Continuous Latent Representations via Autoencoders: Beyond simple embedding mixing, researchers have used autoencoders and VAEs to learn shared latent spaces for sentences, enabling more controlled interpolation. A variational autoencoder can encode a sentence’s meaning into a continuous vector and decode it in a chosen language. In a cross-lingual VAE, one could encode a thought in language A and decode in language B – effectively translation via a latent space. By interpolating between two latent vectors (one representing an English sentence, another its Spanish translation), one could decode intermediate points to get partially translated sentences. In practice, however, vanilla VAEs struggle with complex code-switched patterns (A Deep Generative Model for Code Switched Text). Research by Gupta et al. (2018) and others introduced specialized models for code-switched generation. For example, a Variational Autoencoder for Code-Switching (VACS) was proposed to better handle the “mixing” structure of bilingual sentences (A Deep Generative Model for Code Switched Text). VACS uses a hierarchical latent code: one part captures the language-switching pattern and another the content. This allowed generating synthetic code-mixed text that more closely follows natural switching grammar (A Deep Generative Model for Code Switched Text). Such continuous latent models demonstrate that in theory one can smoothly transition between languages in meaning-space, but ensuring the output remains grammatical in both languages is non-trivial.
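The latent-interpolation recipe itself is simple and model-agnostic. The sketch below assumes hypothetical encode/decode handles for some cross-lingual (variational) autoencoder – no particular released model is implied – and walks a straight line between two latent codes:

```python
import numpy as np

# Hypothetical handles: encode() maps (sentence, language) to a latent vector
# (e.g. the posterior mean of a cross-lingual VAE); decode() maps a latent
# vector back to text in a requested language. No specific released model
# is assumed here.
def interpolate_latents(encode, decode, sent_a, lang_a, sent_b, lang_b, steps=5):
    z_a = encode(sent_a, lang=lang_a)
    z_b = encode(sent_b, lang=lang_b)
    outputs = []
    for t in np.linspace(0.0, 1.0, steps):
        z = (1.0 - t) * z_a + t * z_b          # straight-line path in latent space
        # The decoding language can be held fixed, or itself switched near t = 0.5.
        outputs.append((float(t), decode(z, lang=lang_b)))
    return outputs
```

Whether the intermediate points decode to grammatical sentences depends entirely on how well-behaved the learned latent space is, which is exactly the difficulty noted above.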
Quality, Consistency, and Intelligibility: A key question is whether these interpolated or mixed-language outputs make sense. Results so far indicate partially mixed outputs can be produced, but with varying consistency. In Johnson et al.’s weighted embedding experiments, the intelligibility of the output was usually high (the model wasn’t babbling random characters – it stuck to real words in either language), but the outputs were not a new “halfway” language. Rather, the decoder would code-switch: e.g. output a full clause in Japanese then a clause in Korean ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation). For closely related languages that share vocabulary or scripts (e.g. Spanish and Portuguese), the model sometimes intermingled words from both languages in one sentence. An example from Johnson et al. shows a Spanish→Portuguese interpolation where the sentence gradually swaps Spanish words for Portuguese ones ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation). The intermediate sentences were essentially code-mixed – understandable if you know both languages, though perhaps odd from a monolingual standpoint. In the Russian/Belarusian case, the surprise Ukrainian output was perfectly grammatical Ukrainian ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation), indicating the model actually produced a real-language sentence (just not the one expected!). These findings hint that the latent space captures linguistic relationships, but truly smooth “blends” of languages are hard because the model’s decoder tends to lock onto one language’s grammar at a time. As the authors note, the decoder’s internal language model makes it “very hard to mix words from different languages” once generation is underway ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation). In fact, as soon as a few words of one language are produced, the model’s momentum carries it in that language, reducing attention to the initial language token mix ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation).
That said, with explicit training, models can learn to produce controlled mixed-language output. Recent research on code-switched text generation has achieved outputs that are both consistent and high-quality. For example, Tarunesh et al. (2021) fine-tuned a transformer to generate Hindi-English code-switched sentences. They started with Hindi input and trained the decoder to insert English translations for some words, effectively learning when to switch language. Their generated sentences were evaluated as highly natural, in fact comparable to human-written code-switched text in a Hindi-English setting ([2107.06483] From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text). This demonstrates that with the right data, a model can be taught to smoothly alternate languages in a grammatically valid way. Similarly, the VACS approach mentioned above was shown to produce realistic code-mixed sentences that improved downstream language modeling perplexity when used as additional training data (A Deep Generative Model for Code Switched Text). In summary, intelligibility is achievable, but free-form interpolation in latent space often results in hard switches unless the model is explicitly trained for mixed output.
Applications and Educational Uses: The ability to gradually shift between languages has intriguing applications. One practical use case is language education. Educators have long used the diglot weave technique – embedding foreign words into native-language sentences and increasing the foreign proportion over time – to immerse learners gradually. Neural models can facilitate this by automatically generating “blended” texts. Imagine taking an English sentence and progressively translating more words into French: a multilingual model could potentially do this by adjusting a “language mixture” parameter or by iterative partial translation. In fact, a real-world example of this approach is Disney’s “Learn Chinese: Toy Story 3” app, which presents a story that starts 100% in English and ends 100% in Chinese, incrementally increasing the Chinese content at each level (25%, 50%, 75%, etc.) (Disney Publishing Worldwide Launches Digital Language Learning Product | Business Wire). This app was based on the diglot weave method and showed that gradual language mixing can aid comprehension and learning (Disney Publishing Worldwide Launches Digital Language Learning Product | Business Wire). While the Disney system was likely manually crafted, researchers have proposed using AI to automate such transitions. A multilingual model could generate intermediate texts for any content – for example, by translating some fraction of the sentences or nouns and leaving the rest in the native language, then progressively increasing that fraction. This is an active area of interest for tools that scaffold learners from familiar to new languages.
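A rough sketch of how one might automate such a diglot-weave progression with off-the-shelf tools is shown below; it translates a growing fraction of sentences and leaves the rest in English. The Helsinki-NLP/opus-mt-en-fr checkpoint and the sentence-level granularity are assumptions; a real system might instead swap individual nouns or phrases.

```python
from transformers import pipeline

# An English->French MT model; the specific checkpoint is an assumption, and
# any translation model (or a bilingual dictionary for word-level swaps) would do.
translate = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

def diglot_weave(sentences, fraction):
    """Return the text with roughly `fraction` of its sentences rendered in French."""
    n_translated = round(fraction * len(sentences))
    woven = []
    for i, sentence in enumerate(sentences):
        if i < n_translated:                   # translate the leading sentences
            woven.append(translate(sentence)[0]["translation_text"])
        else:                                  # keep the rest in English
            woven.append(sentence)
    return " ".join(woven)

story = [
    "The toys woke up early in the morning.",
    "They planned a great adventure in the garden.",
    "By evening, everyone was tired but happy.",
]
for frac in (0.0, 0.33, 0.66, 1.0):
    print(f"--- about {int(frac * 100)}% French ---")
    print(diglot_weave(story, frac))
```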
Open-Source Implementations and Demos: If you’re interested in experimenting with language interpolation, there are resources to explore. On the research side, frameworks like OpenNMT or Fairseq allow training multilingual translation models with language tokens; one can then manually mix or weight the language tokens at inference (replicating the Johnson et al. approach). For instance, Johnson’s 2017 paper described how they simply took the learned embeddings for <2ru> (to Russian) and <2be> (to Belarusian) and blended them – a technique one could try with an open-source multilingual transformer ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation). As for code-switching generation, there are publicly available datasets and models. Tarunesh et al. (2021) have released their code for Hindi-English code-switched text generation (as part of their paper), and others have published corpora for code-mixed tweets and dialogues that can be used to fine-tune language models. Additionally, there are rule-based tools like Microsoft’s CodeMixed Text Generator on GitHub, which uses linguistic theories to generate grammatically valid code-mixed sentences from parallel corpora (GitHub - microsoft/CodeMixed-Text-Generator: This tool helps automatic generation of grammatically valid synthetic Code-mixed data by utilizing linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.). While not neural, it provides a sandbox for creating mixed-language text and could be a starting point for then training a model on that synthetic data.
For a hands-on demo of latent-space interpolation, one might use an open multilingual autoencoder. Researchers have built cross-lingual VAEs whose code is sometimes open-sourced; although not plug-and-play, these allow encoding a sentence and then decoding it in another language (or a mix). If you prefer a more direct route, large pre-trained models like mBART-50 (a multilingual sequence-to-sequence model) can be prompted in a way that mixes languages. By inputting a sentence in language A and instructing the model to output language B, or even concatenating multiple language codes in the prompt, you may observe code-switching in the output. Even GPT-3/GPT-4 style models can do this: e.g., asking ChatGPT to retell a paragraph “gradually transitioning from English to Spanish” will typically yield a mixed paragraph in which English words are progressively replaced by Spanish ones. This shows that real-world systems are catching up: what began as curious research experiments with latent embeddings is now feasible (and sometimes intentionally used) in applications from data augmentation to language learning.
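For the prompting route, a quick illustration with the OpenAI Python client is given below; the model name and the exact wording of the instruction are assumptions, and any sufficiently capable chat model should behave similarly.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

paragraph = (
    "The old lighthouse keeper climbed the stairs every evening. "
    "He lit the lamp and watched the ships pass in the dark."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",   # assumed model name; any capable chat model works
    messages=[{
        "role": "user",
        "content": (
            "Retell the following paragraph, gradually transitioning from "
            "English to Spanish, so that it begins fully in English and ends "
            f"fully in Spanish:\n\n{paragraph}"
        ),
    }],
)
print(response.choices[0].message.content)
```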
Key Takeaways: Neural networks do learn an internal representation where languages exist in a shared space, and we can exploit this to generate hybrid language output. Multilingual transformers provide a straightforward way to interpolate between languages by mixing language embeddings ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation). The outputs tend to be code-mixed (chunks of each language) rather than a novel “blended” language, due to the model favoring one language’s grammar at a time ([1611.04558] Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation). With careful training (e.g. specialized VAEs or fine-tuned translation models), the mixed outputs can be made coherent and natural, as evidenced by code-switched text generators that achieve near-human quality ([2107.06483] From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text). This approach has even been applied outside the lab – for example, in educational apps that gradually shift an English text into Mandarin for learners (Disney Publishing Worldwide Launches Digital Language Learning Product | Business Wire). As open-source multilingual models and tools become more advanced, we’re likely to see “language interpolation” used in practice – from creating intermediate-language translations to crafting learning materials that smoothly bridge languages.
Sources:
Johnson et al. 2017 – “Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation.” arXiv:1611.04558. (See Sec. 6 on mixing languages for the code-switching and weighted target-language interpolation experiments.)
Tarunesh et al. 2021 – “From Machine Translation to Code-Switching: Generating High-Quality Code-Switched Text.” arXiv:2107.06483. (Demonstrates fine-tuning a transformer to produce Hindi-English mixed sentences with human-level fluency.)
Samanta et al. 2019 – “A Deep Generative Model for Code Switched Text” (VACS). (Introduces a variational autoencoder that learns a latent space for code-switching, enabling generation of realistic mixed-language sentences.)
Disney “Diglot Weave” App (2012) – Learn Chinese: Toy Story 3 press release (Business Wire). (Example of gradually interpolating between languages in learning content: the story transitions from 0% to 100% Chinese in stages.)
Microsoft CodeMixing Tool (GitHub, 2020) – CodeMixed-Text-Generator. (Open-source toolkit that generates grammatically valid code-mixed text using linguistic constraints such as the Equivalence Constraint and Matrix Language theories; used here as a real-world system for producing mixed-language data.)